ABSTRACT

A study investigated possible reasons for the low performance in 2000 on the
writing portion of the Delaware Student Testing Program (DSTP), especially
among students in grades 3 and 5. The study also investigated ways to improve
classroom instruction in writing. A panel of teachers reviewed the anchor
papers and the process of testing. Panel members re-scored the anchor papers.
A second panel of teachers participated in a re-scoring of a sample of 100
text-based student writings per grade (without using anchor papers). Panel
members also discussed related issues in test administration, test
development, scoring, and classroom instruction. Results include: (1) two
text-based writing prompts should not be given on the same day; (2)
instructions should be written to draw students' attention; (3) the new
scores and the original scores on anchor papers were highly consistent in
grades 3 and 5, and moderately high in grade 8 and grade 10; (4) passages
should be engaging and the difficulty level should be consistent from year to
year; (5) wording in the prompt should always direct students back to the
text; (6) tenth grade teachers pointed out that most of the writing done by
high school students is text-based, and that text-based writing is not a
separate type of writing; (7) fifth grade teachers suggested that committees
or teams develop questioning activities for teachers to use to improve
students' performance on text-based writing; and (8) grade patterns for
correlation coefficients were identified, with grade 3 being the lowest and
grade 10 the highest. Changes planned for the Spring 2001 writing assessment
include: the range for the total writing scores will be 1 to 15; two
text-based writing tasks will be administered on different days; a prewriting
sheet and scratch paper will be provided; and the text-based writing prompts
will be formatted closer to the stand-alone prompts. Contains 6 tables of
data. Appendixes contain the writing scoring rubric and a correlation matrix
between reading and writing. (RS)
Special Writing Study Report



Introduction 

The objective of the Delaware Student Testing Program (DSTP) is to measure student 
progress toward the Delaware Content Standards. Each spring, all public school students 
in grades 3, 5, 8, and 10 take the statewide assessment in reading, writing, and 
mathematics. The writing assessment consists of a text-based writing task and a
stand-alone writing prompt. The text-based writing task links to a passage in the DSTP
reading assessment, and students' responses to this task are scored twice, once for a
reading score and once for a writing score. Both stand-alone and text-based writings are
untimed. The stand-alone writing usually takes approximately 2 hours, including a
pre-writing session, a first draft, and a final draft. Only the final draft of this prompt
is scored.

A 5-point scoring rubric (Please see Attachment A) is used to score both the text-based 
and stand-alone responses. One reader scores the text-based writing; two readers score 
the stand-alone writing. The lowest score for the text-based writing is 1 and the highest 
possible score is 5; the lowest score for the stand-alone writing is 2 and the highest 
possible score is 10. The total writing raw score is the sum of the text-based writing
score and the stand-alone writing score. Thus, the score range currently is from a low of 
3 to a high of 15. 
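
The score arithmetic above can be sketched in a few lines of Python. The function and argument names are illustrative assumptions, not part of any DSTP documentation, and the sketch assumes that the stand-alone score is the sum of the two readers' rubric scores, which is consistent with the stated range of 2 to 10.

```python
RUBRIC_MIN, RUBRIC_MAX = 1, 5  # the 5-point scoring rubric


def total_writing_raw_score(text_based, stand_alone_reader_1, stand_alone_reader_2):
    """Return the total writing raw score under the scoring rules described above.

    Assumption: the stand-alone score is the sum of two readers' rubric scores.
    """
    for score in (text_based, stand_alone_reader_1, stand_alone_reader_2):
        if not RUBRIC_MIN <= score <= RUBRIC_MAX:
            raise ValueError(f"rubric score out of range: {score}")
    stand_alone = stand_alone_reader_1 + stand_alone_reader_2  # ranges from 2 to 10
    return text_based + stand_alone  # ranges from 3 to 15


lowest = total_writing_raw_score(1, 1, 1)
highest = total_writing_raw_score(5, 5, 5)
print(lowest, highest)
```

Running the sketch with the minimum and maximum rubric scores reproduces the 3 to 15 range stated in the text.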

Over the past three years, the overall writing scores have declined in grades 3 and 5, 
remained steady in grade 8, and increased slightly in grade 10 (see Tables 1a-1d). The
average performance on the stand-alone writing shows a consistent pattern of increase
across years for students in grades 8 and 10, with minor fluctuations over time for
students in grades 3 and 5. Student performance on the text-based writing, however, dropped to the
lowest level in 2000 for all grades except grade 10. Because of the drop in text-based 
writing scores, the Assessment and Analysis Group decided to conduct a special writing 
study to investigate the possible reasons for the low performance in 2000, especially in 
grades 3 and 5. 

Purpose of the Study 

The primary purposes of this study were (1) to investigate the possible reasons for the 
low performance on the text-based writing in 2000, especially in grades 3 and 5 and (2) 
to investigate ways to improve classroom instruction in writing. 

Methods of the Study 

General Design. Due to time constraints and the availability of information and data, this
study focused on the following five aspects:

• Review the process of testing (i.e., review of test administration and testing
materials);

• Review text-based writing scores (i.e., review anchor papers and re-score a
sample of students' responses to the text-based writing task); 

• Examine construct validity evidence (i.e., review available data and conduct 
additional statistical analyses); 

• Make recommendations for the development of text-based writing tasks; and 

• Make recommendations on ways to improve classroom instruction in writing. 

This study was conducted in two parts. In part one, a panel of teachers reviewed the 
anchor papers and the process of testing. Anchor papers are a sample of students' 
writings that are used as benchmarks in scoring and represent score points on the rubric. 
In this study, the panel members re-scored the anchor papers of a given grade
independently and then worked in a small group to discuss and finalize their scores. In
part two, a second panel of teachers participated in a re-scoring session for a sample of
100 text-based writings per grade. They scored students' writings holistically and
analytically using the 5-point scoring rubric. Since the anchor papers were under review,
the re-scoring was conducted without using anchor papers. Each writing sample was evaluated
by up to 5 teachers. The panel members then discussed related issues in test 
administration, test development, scoring, and classroom instruction. 

Sample of Student Writings. A random sample of 100 student responses to the text-based
writing task was selected from the population of each grade for re-scoring in this study.

Panels of Teachers. Two panels of teachers were invited to participate in this study, one
for anchor paper review and one for the re-scoring session. These teachers were selected 
based on their expertise in writing, teaching experience, experience in the development of 
writing assessment and scoring, familiarity with the Delaware Content Standards in 
English language arts and the writing scoring rubric, geographic location, and 
availability. 

The Anchor Paper Review Panel consisted of 9 members. Seven of the panel members 
(78%) have served on the test development committees and 2 (22%) were involved in the 
anchor paper pulling for the 2000 DSTP writing assessment. 

The Re-Scoring Panel included 22 members. Nearly half have served on the test
development committees and about a quarter were involved in the anchor paper pulling 
for the 2000 DSTP writing assessment. 

Data Analysis and Summary of Comments. To investigate the possible reasons for the low
performance on the text-based writing in 2000, teachers reviewed, discussed, and made
recommendations for improving test administration, test development, scoring, and
text-based writing instruction. These comments are reported in the “Results of the Study”
section. The results of data analyses are presented in tables and charts. Data analyses
for this study include:

• Three-year comparisons of statistics of writing scores by grade

• Correlation analysis of all types of writing scores and reading scores by grade
for the 2000 DSTP

Results of the Study 

The results of the study are reported in five categories: test administration, re-scoring
of anchor papers, text-based writing development, text-based writing instruction, and
construct validity evidence.

Test Administration. Both panels reviewed the 2000 DSTP Directions for Administering
the Test and test booklets and compared them with the previous years' testing
materials and the process of testing. Suggestions for changes include:

1. Two text-based writing prompts, one for the field test and one for the operational
test, should not be given on the same day, especially for younger students.

2. The text-based writing should be given at the beginning of the reading test rather
than as the last item of the day.

3. The instructions for the text-based writing should be written to draw students'
attention (for example, bolded for emphasis or placed on separate pages) to ensure
that students understand this item will be scored twice, for both reading and writing.

4. The text-based writing task should be formatted similarly to the stand-alone
writing prompt, for example by including pre-writing.

Re-scoring Anchor Papers. To examine the accuracy of scoring, the anchor papers were
reviewed and re-scored by the first panel. The results of re-scoring the anchor papers
show that the consistency between the new scores and the original scores is high in
grades 3 and 5, and moderately high in grades 8 and 10.
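
The report does not state which statistic was used to judge consistency between the original and the new anchor-paper scores. One common choice for rubric scores is the rate of exact agreement and of adjacent agreement (scores within one point). The following is a minimal sketch under that assumption; the function name and the score lists are hypothetical and are not data from this study.

```python
def agreement_rates(original, rescored):
    """Return (exact, adjacent) agreement rates between two lists of rubric scores.

    exact: share of papers given the same score both times.
    adjacent: share of papers whose two scores differ by at most one point.
    """
    if len(original) != len(rescored) or not original:
        raise ValueError("score lists must be non-empty and of equal length")
    pairs = list(zip(original, rescored))
    exact = sum(o == r for o, r in pairs) / len(pairs)
    adjacent = sum(abs(o - r) <= 1 for o, r in pairs) / len(pairs)
    return exact, adjacent


# Hypothetical scores for eight anchor papers (illustration only).
original_scores = [1, 2, 3, 4, 5, 3, 2, 4]
panel_scores = [1, 2, 3, 4, 4, 3, 3, 4]

exact, adjacent = agreement_rates(original_scores, panel_scores)
print(f"exact agreement: {exact:.2f}, adjacent agreement: {adjacent:.2f}")
```

Reporting both rates is useful because a 5-point rubric often produces one-point disagreements that exact agreement alone would overstate.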

Text-Based Writing Development. During group discussions, teachers provided comments
and suggestions related to the development of the text-based writing. Their comments
focused on three major issues: passage selection, wording of the prompt, and use of the
writing rubric.

• Passage Selection: Passages should be engaging and the difficulty level should be
consistent from year to year. Third grade teachers preferred realistic stories as the
basis for the text-based writing. Fifth grade teachers thought passages should be
informative selections dealing with social studies or science.

• Wording of the Prompt: The wording in the prompt should always direct the
students back to the text so that information from the text is included in the
response. “Use details from the text to support your answer” should be in all
prompts. Students should understand the concepts implied in the wording of the
prompt. Developers should take care to use “user accessible” language in
writing the prompts.

• Use of the Writing Rubric: Teachers discussed the possibility of adapting the
general writing rubric so that each text-based writing item would have an
item-specific writing rubric. With guidelines provided in these item-specific rubrics, it
would be easier for scorers to determine when a student's writing is off topic.

Text-Based Writing Instruction. Teachers' comments related to instructional issues of the
text-based writing focused on professional development in crafting text-based prompts
and on identifying a variety of reading passages with which to write such prompts. They 
emphasized the need to have students write in response to a variety of text types (literary, 
informative, and technical) across content areas, and for teachers to model the process of 
making connections to the text and pulling out relevant details. 

• Tenth grade teachers pointed out that most of the writing done by high school
students is text-based, and that text-based writing is not a separate type of writing.
Written responses to texts are produced as forms of persuasive, expressive, or
informative writing. High school teachers also expressed a concern regarding
block scheduling, where students may have only five weeks of instruction prior to
the administration of the DSTP. Finally, tenth grade teachers suggested that high
school English teachers have moved away from literary analysis in favor of an
emphasis on stand-alone writing prompts, which may come at the expense of students'
writing in response to text.

• Fifth grade teachers stressed the importance of students making connections with
characters in a story. They suggested that grade-level teams or district
committees (led by reading cadre representatives) develop questioning activities
for teachers to use to improve students' performance on text-based writing. They
also pointed out the need to release sample student responses to text-based writing
prompts.

Construct Validity Evidence. The statistics of the three writing scores (text-based,
stand-alone, and total writing raw scores) are compared by grade for 1998, 1999, and 2000
(see Table 1).

As indicated earlier in this report, the text-based writing task is attached to a passage
in the DSTP reading assessment. This passage includes several multiple-choice (MC) and
constructed-response (CR) items. The student's response to one of the CR items was
scored both as part of the reading score, using the reading scoring rubric, and as the
text-based writing score, using the writing rubric.

Tables 2a and 2b present the correlation coefficients among five reading scores, three 
writing scores, and the SAT9 reading comprehension test scores by grade from the 2000 
DSTP. The analyses are based on the following nine variables:

• MCITEM: The multiple-choice item score is the sum of scores on all MC items
attached to the reading passage

• CRITEM: The constructed-response item score is the sum of scores on all
constructed-response items attached to the reading passage

• PASSAGE: The passage score is the sum of scores on all MC and CR items
attached to the reading passage

• IREADING: The reading item score is the score on the extended
constructed-response item (the same item was scored for the text-based writing ‘TEXT’)

• TEXT: The text-based writing score is the score on the extended
constructed-response item (the same item was scored for reading ‘IREADING’)

• PROMPT: The writing score on the stand-alone writing prompt

• WRITING: The total writing raw score that is the sum of the text-based and
stand-alone writing scores

• READING: The DSTP reading score

• SAT9: The reading score on the 30-item SAT9 reading comprehension test
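
Two of the variables above are defined as sums of others: PASSAGE combines the MC and CR item scores, and WRITING combines TEXT and PROMPT. A minimal sketch of that relationship follows. The class name, field names, and example values are hypothetical and serve only to illustrate the definitions; they are not taken from the DSTP data files.

```python
from dataclasses import dataclass


@dataclass
class PassageAndWritingScores:
    """Hypothetical per-student record for the derived variables defined above."""

    mcitem: int  # sum of scores on MC items attached to the reading passage
    critem: int  # sum of scores on CR items attached to the reading passage
    text: int    # text-based writing score (1 to 5)
    prompt: int  # stand-alone writing score (2 to 10)

    @property
    def passage(self) -> int:
        # PASSAGE: sum of scores on all MC and CR items for the passage
        return self.mcitem + self.critem

    @property
    def writing(self) -> int:
        # WRITING: sum of the text-based and stand-alone writing scores
        return self.text + self.prompt


# Illustrative values only.
example = PassageAndWritingScores(mcitem=6, critem=5, text=3, prompt=7)
print(example.passage, example.writing)
```

Because PASSAGE and WRITING contain their component scores, correlations between a total and one of its own parts are expected to be high by construction.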

The results show that the correlation coefficients between the text-based writing scores
(TEXT) and the item reading scores (IREADING) from the same CR items are .22 for
grade 3, .45 for grade 5, .57 for grade 8, and .60 for grade 10. First, the statistics indicate
a grade pattern, from the lowest value of the correlation coefficient in grade 3 to the
highest value in grade 10. Second, the low correlation in grade 3 suggests that only about
5% of the variance in one score is associated with the other score; in grade 5, about 20%
of the variance is shared. The correlations between the text-based writing scores (TEXT)
and the scores on the MC items (MCITEM), the CR items (CRITEM), and the passage
(PASSAGE) are .19, .31, and .31 in grade 3, the lowest among the four grades. Again, these
low correlations suggest that only 4% to 10% of the variance of the text-based writing
scores is associated with the MC item scores, CR item scores, and passage scores.
Similarly, the corresponding correlations range from .26 to .44 in grade 5, which
indicates that 7% to 19% of the variance in the text-based writing scores can be
accounted for by the scores from reading.
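
The percentages quoted above follow from squaring the correlation coefficient: the share of variance that two scores have in common is r squared. A short sketch of that arithmetic, using the TEXT and IREADING correlations reported in this section, is given below; the function name is an illustrative assumption.

```python
def shared_variance_percent(r):
    """Percent of variance shared by two scores with Pearson correlation r."""
    return 100 * r * r


# Correlations between TEXT and IREADING by grade, as reported in this section.
text_vs_ireading = {3: 0.22, 5: 0.45, 8: 0.57, 10: 0.60}

for grade, r in text_vs_ireading.items():
    percent = shared_variance_percent(r)
    print(f"grade {grade}: r = {r:.2f}, shared variance = {percent:.0f}%")
```

The same calculation applied to the grade 3 passage correlations (.19 and .31) gives roughly 4% and 10%.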

The correlations between the text-based writing score (TEXT) and the stand-alone writing
score (PROMPT) are .36, .41, .41, and .48 for grades 3, 5, 8, and 10, respectively. A
grade pattern is observed, where the correlation coefficient for grade 3 is the lowest
among the four grades. The low and moderately low correlations across grades suggest
that about 13% to 23% of the variance in the text-based writing scores is associated
with the stand-alone writing scores. These statistics appear to suggest that the
text-based writing measures different types of writing skills, or different constructs,
from the stand-alone writing.

The correlations between SAT9 reading scores and the DSTP reading scores
(READING) are stable across grades, ranging from .84 to .86, and no grade pattern is
found. Moreover, the sizes of the correlation coefficients between the SAT9 reading and
the stand-alone writing scores (PROMPT) are very close, ranging from .41 to .48 across
grades without a grade pattern. The correlations between the SAT9 reading and the
text-based writing scores (TEXT), however, show a grade pattern with the lowest coefficient
in grade 3 (r=.33) and the highest coefficient in grades 8 and 10 (r=.48).

The correlation matrix among different types of writing and reading scores for the 1998 
and 1999 DSTP provides additional information for the construct validity (See 
Attachment B). The correlation coefficients between reading and writing scores are 
consistent in 1998 and 1999. The correlations between text-based writing and reading 
scores are higher in 1998 and 1999 (r= .56 in 1998 and r=.60 in 1999 for grade 3; r=.60 in 
1998 and r=.56 in 1999 for grade 5) than in 2000 (r=.33 in grade 3; r=.44 in grade 5).
In grade 3, the correlation between text-based and stand-alone writings is lower in 2000
(r=.36) than in the previous years (r=.45 in 1998; r=.46 in 1999). The correlation between
total writing and reading scores also shows the lowest value in 2000 for grade 3
(r=.63 in 1998; r=.68 in 1999; r=.53 in 2000). Such variations of the statistics across
years of testing may be due to: 

• Low generalization of writing scores across topics, the purposes of writing tasks, 
and occasions; 

• Variations in the characteristics of the reading passages and attached items from 
year to year and from grade to grade; or 

• Variations in writing skills among student populations from year to year. 

Limitations of the Study 

As indicated in the beginning of this report, the current study was designed and 
conducted based on the available data within a short period of time. Due to the 
limitations of the study, the author suggests caution in reviewing, interpreting, and using 
the results of this study. 

• Information, such as sampling procedures and students' scores on the field test, is 
not available for review and additional analysis. 

• Even though the sample of student text-based writings was randomly selected, the
small sample size, only 1% of the grade population used in re-scoring, may not
accurately reflect the characteristics of the population because of sampling errors.
In addition, since the anchor papers were under review, the re-scoring process was
conducted without using anchor papers.

• It is very important to note that previous studies (Moon et al., 1996; Fitzpatrick et
al., 1994; Dunbar, Koretz, Hoover, 1991; Canton and Hoover, 1986) have shown
that the generalization of writing performance is low across the purpose (or
discourse) of writing tasks, writing topics, and occasions, especially when there
are only a couple of items used in the writing assessment. In 2000, a new
text-based writing task was introduced at each grade level, which may be one of the
reasons for the fluctuation of the test scores. For example, third graders
responded to an informative writing task instead of a persuasive writing task.
Similarly, the fifth graders responded to an informative writing task in 1998 and
1999, but an expressive writing task in 2000. These changes could account for
the low performance.







• An important issue of educational measurement is reliability. Reliability of
performance-based assessment, such as writing, is often defined by agreement of
readers in scoring a single task given on a single occasion, called inter-reader
reliability or inter-reader consistency. However, another component of reliability,
called score reliability, involves the consistency of measurement over repeated
occasions given fixed readers. Findings from early studies (Moon et al., 1996;
Dunbar, Koretz, Hoover, 1991; Canton and Hoover, 1986) suggest that reader
consistency differed considerably, ranging from .33 to .91, and that score reliability
ranged from .26 to .60, depending on the number of points on the
scoring scale, rating conditions, and changes in assessment programs (Dunbar et
al., 1991; Fitzpatrick et al., 1994). The results of an experimental study conducted
in Virginia (Moon et al., 1996) indicate that methods used for training and scoring
(i.e., training readers to score multiple writing prompts at a single session or
training readers to sequentially score writing prompts) impact both reliability and
validity. They also found that readers scored differently using the same scoring
method on the same set of students' papers across years. To better understand the
nature of direct writing assessment and provide valid and reliable measures of
student achievement, more research questions, such as the stability of scoring
over time, the process of reader training and scoring, and score reliability across
topics, discourses, and occasions, need to be further explored.

Conclusion 

This study provides informative findings concerning student performance on direct 
writing assessment over time. The results of the statistical analysis help us better 
understand the characteristics of the text-based writing in large-scale assessment. 
Comments from the panels on test administration, scoring, and text-based writing
development have been carefully reviewed and discussed by the Department of
Education. The Text-Based Writing Subcommittee has recently made recommendations
for the text-based writing assessment. As a result, the following changes are planned for
the Spring 2001 writing assessment: 

• The range for the total writing scores will be 1-15. The rules for calculating the
total scores for the writing section of the DSTP will be changed to decrease the
number of invalid total writing scores: if a student receives a valid score on either
the stand-alone or text-based prompt, a total writing score will be reported for that
student. Previously, a student received a total writing score only if both
writing prompts had valid scores.

• The two text-based writing tasks (one is for the field test) will be administered on
different days.

• A prewriting sheet and scratch paper will be available for students to use as they
plan their response to the text-based writing prompts.

• The text-based writing prompts will be formatted closer to the stand-alone
prompts (i.e., the text-based writing prompts will be presented in a "box").
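
The change to the total-score rule in the first item above can be illustrated with a small sketch. The report does not spell out how the total is computed when only one component is valid; the sketch assumes the total is the sum of whichever component scores are valid, which is consistent with the new 1-15 range (a lone text-based score can be as low as 1). Function names are hypothetical, and `None` stands in for an invalid score.

```python
from typing import Optional


def total_writing_score_previous(text_based: Optional[int],
                                 stand_alone: Optional[int]) -> Optional[int]:
    """Previous rule: a total is reported only if both components are valid."""
    if text_based is None or stand_alone is None:
        return None
    return text_based + stand_alone


def total_writing_score_2001(text_based: Optional[int],
                             stand_alone: Optional[int]) -> Optional[int]:
    """Planned rule: a total is reported if either component is valid.

    Assumption: the total is the sum of the valid component scores.
    """
    valid = [score for score in (text_based, stand_alone) if score is not None]
    if not valid:
        return None
    return sum(valid)


# A student with a valid text-based score of 1 and an invalid stand-alone score.
print(total_writing_score_previous(1, None))
print(total_writing_score_2001(1, None))
```

Under the previous rule this student would receive no total writing score, while under the planned rule a total is reported.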

In addition, recommendations regarding the future construction of text-based prompts
based on this study will be discussed and operationalized by the test development
committees, which are responsible for writing the stand-alone prompts and the
text-based writing tasks.

Table 1a

Comparisons of Writing Scores for 1998, 1999 and 2000 DSTP for Grade 3

Comparisons of Writing Scores for 1998, 1999 and 2000 DSTP for Grade 5

Comparisons of Writing Scores for 1998, 1999 and 2000 DSTP for Grade 10

Attachment A
Writing Scoring Rubric 



Delaware Student Testing Program - General Rubric for Writing 


Attachment B
Correlation Matrix Between Reading and Writing

1999               Reading   Text-Based   Stand-Alone   Writing Total

Grade 3
  Reading            1.00
  Text-Based         0.60       1.00
  Stand-Alone        0.57       0.45         1.00
  Writing Total      0.68       0.77         0.92           1.00

Grade 5
  Reading            1.00
  Text-Based         0.56       1.00
  Stand-Alone        0.55       0.46         1.00
  Writing Total      0.64       0.79         0.91           1.00

Grade 8
  Reading            1.00
  Text-Based         0.65       1.00
  Stand-Alone        0.55       0.49         1.00
  Writing Total      0.68       0.80         0.91           1.00

Grade 10
  Reading            1.00
  Text-Based         0.55       1.00
  Stand-Alone        0.59       0.47         1.00
  Writing Total      0.66       0.75         0.92           1.00

1998               Reading   Text-Based   Stand-Alone   Writing Total

Grade 3
  Reading            1.00
  Text-Based         0.56       1.00
  Stand-Alone        0.52       0.44         1.00
  Writing Total      0.63       0.77         0.91           1.00

Grade 5
  Reading            1.00
  Text-Based         0.60       1.00
  Stand-Alone        0.57       0.45         1.00
  Writing Total      0.68       0.79         0.90           1.00

Grade 8
  Reading            1.00
  Text-Based         0.60       1.00
  Stand-Alone        0.63       0.54         1.00
  Writing Total      0.70       0.80         0.94           1.00

Grade 10
  Reading            1.00
  Text-Based         0.56       1.00
  Stand-Alone        0.56       0.47         1.00
  Writing Total      0.65       0.81         0.90           1.00


* All correlation coefficients are calculated based on aggregated data. 



